Abstract

Sentences are far more ambiguous than one might have thought. There may be hundreds, perhaps
thousands of syntactic parse trees for certain very natural sentences of English. This fact has been a major
problem confronting natural language processing because it indicates that it may require a long time to
construct a list of all the parse trees, and furthermore, it isn’t clear what to do with the list once it has been
constructed. This list may be so numerous that it is probably not the most convenient representation for
communication with the semantic and pragmatic processing modules. In this paper we propose some
methods for dealing with syntactic ambiguity in ways that take advantage of certain regularities among the
alternative parse trees. These regularities will be expressed as linear combinations of ATN networks, and also
as sums and products of formal power series.

We will suggest some ways that a practical processor can take advantage of this modularity in order to
deal more efficiently with combinatoric ambiguity. In particular, we will show how a processor can efficiently
compute the ambiguity of an input sentence (or any portion thereof). Furthermore, we will show how to
compile certain grammars into a form that can be processed more efficiently. In some cases, including the
“every way ambiguous” grammar (e.g., conjunction, prepositional phrases, noun-noun modification),
processing time will be reduced from O(n^3) to O(n). Finally, we will show how to uncompile certain highly
optimized grammars into a form suitable for linguistic analysis.
Most parsers find the set of parse trees by starting with the empty set and adding to it each time they
find a new possibility. We make the observation that in certain situations it would be much more efficient to
work in the other direction, starting from the universal set (i.e., the set of all binary trees) and ruling trees out
when the parser decides that they cannot be parses. Ruling-out is easier when the set of parse trees is closer to
the universal set and ruling-in is easier when the set of parse trees is closer to the empty set. Ruling-out is
particularly suited for “every way ambiguous” constructions such as prepositional phrases, which have just as
many parse trees as there are binary trees over the terminal elements. Since every tree is a parse, the parser
doesn’t have to rule any of them out.

In some sense, this is a formalization of an idea that has been in the literature for some time. That is, it
has been noticed for a long time that these sorts of very ambiguous constructions are very difficult for most
parsing algorithms, but (apparently) not for people. This observation has led some researchers to hypothesize
additional parsing mechanisms, such as pseudo-attachment [1: pp. 65-71]¹ and permanent predictable
ambiguity [14: pp. 64-65], so that the parser could “attach all ways” in a single step. However, these
mechanisms have always lacked a precise interpretation; we will present a much more formal way of coping
with “every way ambiguous” grammars, defined in terms of Catalan numbers [8: pp. 388-389, pp. 531-533].

Certain constructions, including the “every way ambiguous” grammar, will be treated as primitive
objects (modules) which can be combined in various ways to produce composite constructions such as lexical
ambiguity which are also very ambiguous, but not quite “every way ambiguous”. Composite constructions
will be analyzed as linear combinations of primitive components, in a sense to be made precise in terms of
formal power series. Equivalently, in ATN notation, composite networks can be analyzed as series and
parallel combinations of primitive networks. This approach has been strongly influenced by linear systems
theory, a classic engineering notion of modularity.

We will suggest some ways that a practical processor can take advantage of this modularity in order to
deal more efficiently with combinatoric ambiguity. In particular, we will show how a processor can efficiently
compute the ambiguity of an input sentence (or any portion thereof). Furthermore, we will show how to
compile certain grammars into a form that can be processed more efficiently. In some cases, including the
“every way ambiguous” grammar, processing time will be reduced from O(n^3) to O(n). Finally, we will show
how to uncompile certain highly optimized grammars into a form suitable for linguistic analysis.


1. The idea of pseudo-attachment was first proposed by Marcus (private communication), though Marcus does not accept the
formulation in [1].


1.  Ambiguity  is  a  Practical  Problem 

Sentences are far more ambiguous than one might have thought. There may be hundreds, perhaps
thousands of syntactic parse trees for certain very natural sentences of English. For example, consider the
following sentence with two prepositional phrases:

(1)  Put  the  block  in  the  box  on  the  table, 
which  has  two  interpretations: 

(2a)  Put  the  block  [in  the  box  on  the  table]. 

(2b)  Put  [the  block  in  the  box]  on  the  table. 

These syntactic ambiguities grow “combinatorially” with the number of prepositional phrases. For example,
when a third PP is added to the sentence above, there are five interpretations:

(3a)  Put  the  block  [[in  the  box  on  the  table]  in  the  kitchen]. 

(3b) Put the block [in the box [on the table in the kitchen]].

(3c)  Put  [[the  block  in  the  box]  on  the  table]  in  the  kitchen. 

(3d)  Put  [the  block  [in  the  box  on  the  table]]  in  the  kitchen. 

(3e)  Put  [the  block  in  the  box]  [on  the  table  in  the  kitchen]. 

When a fourth PP is added, there are fourteen trees, and so on. This sort of combinatoric ambiguity has been
a major problem confronting natural language processing because it indicates that it may require a long time
to construct a list of all the parse trees, and furthermore, it isn’t clear what to do with the list once it has been
constructed. This list may be so numerous that it is probably not the most convenient representation for
communication with the semantic and pragmatic processing modules. In this paper we propose some
methods for dealing with syntactic ambiguity in ways that take advantage of certain regularities among the
alternative parse trees.

In particular, we observe that enumerating the parse trees as above misses the very important
generalization that prepositional phrases are “every way ambiguous”, or more precisely the set of parse trees
over i PPs is the same as the set of binary trees that can be constructed over i + 1 terminal elements. Notice, for
example, that there are two possible binary trees over three elements


(4a)  [ ...  block  ...  [ ...  box  ...  table ...  ]] 

(4b)  [[ ...  block  ...  box ...] ...  table  ...  ] 
corresponding to (2a) and (2b) respectively, and that there are five binary trees over four elements
corresponding to (3a)-(3e) respectively (see figure 1).
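The growth of these counts can be checked mechanically. The following Python sketch is only an illustration and is not taken from any parser discussed here; the name count_binary_trees is our own, and the sketch assumes that a sentence with i PPs corresponds to binary trees over i + 1 terminal elements. It counts trees by choosing where the top-level split falls and multiplying the counts for the two sides.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_binary_trees(leaves: int) -> int:
    """Number of distinct binary bracketings over `leaves` terminal elements.

    Assumes leaves >= 1; a single terminal has exactly one (trivial) tree.
    """
    if leaves == 1:
        return 1
    # Put k leaves in the left subtree and the rest in the right subtree.
    return sum(
        count_binary_trees(k) * count_binary_trees(leaves - k)
        for k in range(1, leaves)
    )


# i PPs are assumed to correspond to i + 1 terminal elements.
for pps in range(1, 5):
    print(pps, "PPs:", count_binary_trees(pps + 1), "bracketings")
```

Under these assumptions the loop reports 1, 2, 5 and 14 bracketings for one to four PPs, in line with the counts quoted for examples (2) and (3) and for the fourth PP.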

These “worst case” scenarios occur very often in practice, as indicated by our experience with the EQSP
parser [11] on the Malhotra Corpus [10].² Almost 2% of the Malhotra Corpus has 300 or more interpretations
according to EQSP. The sentences are given below with the number of parse trees. Note that the first sentence
is almost a thousand ways ambiguous.

958 In as much as allocating costs is a tough job I would like to have the total costs related to each
product.

692 For each plant give the ratio of 1973 to 1972 figures for each type of production cost and
overhead cost.

654 Do you have a model to maximize contribution to the company subject to production and other
constraints?

556 Give actual and budgeted operating costs for all plants, and actual and budgeted management
salaries and interest costs.

512 Give me a breakdown of difference between list and average quoted price for each product for
1972 and 1973.

510 The intent of my question is to find out if you know if your accounting methods can relate the
changes in sales to changes in your expense structures.

322 Display the difference between list price and actual costs (direct + overhead) divided by list
price for plant 2 for the past four years.

382 What was the number of units of product 2 produced at plant 2 in 1973 times the unit price of
product 2?

These sentences show that syntactic constraints are not always very restrictive. This fact has been a
major problem confronting natural language processing because it indicates that it may require a long time to
construct a list of all the parse trees, and furthermore, it isn’t clear what to do with the list once it has been
constructed. The list of parse trees can be so numerous that it is probably not the most efficient
representation for communication with the semantic and pragmatic processing modules. A list representation fails
to take advantage of certain generalizations among the alternative parse trees, especially the “every way
ambiguous” generalization.


2. Malhotra gathered approximately 500 sentences in an experiment which fooled businessmen into believing that they were interacting
with a computer when they were actually communicating with a person in another room.
The “every way ambiguous” generalization is missed by most parsing algorithms currently in practice,
including our own EQSP. These algorithms all construct the set of possible parse trees by starting from the
empty set and adding to it each time they find a new set of analyses. We make the observation that there are
certain situations where it would be much more efficient to work in the other direction, starting from the
universal set and ruling trees out when the parser decides that they cannot be parses. Ruling-out is easier
when the set of parse trees is closer to the universal set and ruling-in is easier when the set of parse trees is
closer to the empty set. Ruling-out is particularly suited for “every way ambiguous” grammars like PPs
because there are no trees to exclude. Similar comments hold for other “every way ambiguous” constructions
such as adjuncts, conjuncts, noun-noun modification, and stacked relative clauses.

These constructions, which will be treated as primitive objects, can be combined in various ways to
produce composite constructions such as lexical ambiguity which may also be very ambiguous, but not
necessarily “every way ambiguous”. Composite constructions can be analyzed as linear combinations of
primitive components. Lexical ambiguity, for example, will be analyzed as the sum of its senses, or in flow
graph terminology [13], as a parallel connection of its senses. Structural ambiguity, on the other hand, will be
analyzed as the product of its components, or in flow graph terminology as a series connection. For example,
the sentence

(5) Was the block in the box on the table?

is structurally ambiguous. The “box” can be associated with either the “block” or the “table”. We will
analyze this sentence as a product of two polynomials, the first corresponding to the subject noun phrase and
the second corresponding to the complement noun phrase. The standard definition of polynomial
multiplication correctly accounts for the two possible attachments of “box”. We prefer this linear systems
view to heuristic search strategies (e.g. [6]), because linear systems can capture generalizations that hold across
alternative interpretations, whereas search strategies tend to probe only a single interpretation (context) at a
time. At the very least, our approach is an improvement over enumerating each tree individually, which
consumes exponential time in the worst case.
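To make the sum and product operations concrete, here is a minimal Python sketch of parallel and series connections acting on ambiguity counts alone. It is a simplification we introduce for illustration: each component is reduced to a list whose k-th entry is a count for inputs containing k phrases, the function names parallel and series are hypothetical, and the coefficient lists below are placeholders rather than an analysis of sentence (5).

```python
def parallel(a, b):
    """Parallel connection: add two coefficient lists term by term."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))  # pad the shorter list with zeros
    b = b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


def series(a, b):
    """Series connection: ordinary polynomial multiplication."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y  # a term of degree i times a term of degree j
    return out


# Placeholder coefficients (not derived from any sentence in the text).
first = [1, 1, 2]
second = [1, 1, 2]

print(parallel(first, second))  # [2, 2, 4]
print(series(first, second))    # [1, 2, 5, 4, 4]
```

Because the input lists are truncated, only the low-order entries of the product (here the first three) include every contributing pair of terms.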

2.  Formal  Power  Series 

This section will make the linear systems analogy more precise by relating context-free grammars to
formal power series (polynomials). Formal power series are a well-known device in the formal language
literature (e.g. [15]) for developing the algebraic properties of context-free grammars. We introduce them
here to establish a formal basis for our upcoming discussion of processing issues.
The power series for grammar (6a) is (6b).

(6a) NP → John | NP and NP

(6b) NP = John + John and John + 2 John and John and John
+ 5 John and John and John and John
+ 14 John and John and John and John and John + ...

Each term consists of a sentence generated by the grammar and an ambiguity coefficient³ which counts how
many ways the sentence can be generated. For example, the sentence “John” has one parse tree

(7a) [John]   1 tree

because the zero-th coefficient of the power series is one. Similarly, the sentence “John and John” also has
one tree because its coefficient is also one,

(7b) [John and John]   1 tree

and  “John  and  John  and  John”  has  two  because  its  coefficient  is  two, 

(7c)  [[John  and  John]  and  John],  [John  and  [John  and  John]]  2  trees 

and “John and John and John and John” has five,

(7d) [John and [[John and John] and John]], [John and [John and [John and John]]],   5 trees
[[[John and John] and John] and John], [[John and [John and John]] and John],
[[John and John] and [John and John]]

and  so  on.  The  reader  can  verify  for  himself  that  “John  and  John  and  John  and  John  and  John”  has  fourteen 
trees. 

Note that the power series encapsulates the ambiguity response of the system (grammar) to all possible
input sentences. In this way, the power series is analogous to the impulse response in electrical engineering,
which encapsulates the response of the system (circuit) to all possible input frequencies. (Ambiguity
coefficients bear a strong resemblance to frequency coefficients in Fourier analysis.) All of these transformed
representation systems (e.g., power series, impulse response, and Fourier series) provide a complete
description of the system with no loss of information⁴ (and no heuristic approximations (e.g., search strategies
[6])). Transforms are often very useful because they provide a different point of view. Certain observations
are more easily seen in the transform space than in the original space, and vice versa.


3. The formal language literature [5, 15] uses the term support instead of ambiguity coefficient.

This paper will discuss several ways to generate the power series. Initially let us consider successive
approximation. Of all the techniques to be presented here, successive approximation most closely resembles
the approach taken by most current chart parsers including EQSP. The alternative approaches take advantage
of certain regularities in the power series in order to produce the same results more efficiently.

Successive approximation works as follows. First we translate grammar (6a) into the equation

(8) NP = John + NP • and • NP

where “+” connects two ways of generating an NP and “•” concatenates two parts of an NP. In some sense, we
want to “solve” this equation for NP. This can be accomplished by refining successive approximations. An
initial approximation NP_0 is formed by taking NP to be the empty language.

(9a) NP_0 = 0

Then we form the next approximation by substituting the previous approximation into equation (8), and
simplifying according to the usual rules of algebra (e.g. assuming distributivity, associativity,⁵ identity
element and zero element).

(9b) NP_1 = John + NP_0 • and • NP_0 = John + 0 • and • 0 = John

We continue refining the approximation in this way.

(9c) NP_2 = John + NP_1 • and • NP_1 = John + John and John


4. This needs a qualification. It is true that the power series provides a complete description of the ambiguity response to any input
sentence. However, the power series representation may be losing some information that would be useful for parsing. In particular,
there might be some cases where it is impossible to recover the parse trees exactly, as we will see, though this may not be too serious a
problem for many practical applications. That is, it is often possible to recover most (if not all) of the structure, which may be adequate
for many applications.

5. The careful reader may correctly object to this assumption. We include it here for expository convenience, as it greatly simplifies the
derivations, though it should be noted that many of the results could be derived without the assumption. Furthermore, this assumption is
valid for counting ambiguity. That is, |A • B| * |C| = |A| * |B • C|, where A, B and C are sets of trees and |A| denotes the number of
members of A, and * is integer multiplication.
(9d) NP_3 = John + NP_2 • and • NP_2

= John + (John + John and John) • and • (John + John and John)

= John + John and John + John and John and John + John and John and John
+ John and John and John and John
= John + John and John + 2 John and John and John
+ John and John and John and John


Eventually, we have NP expressed as an infinitely long polynomial (6b) above. This expression can be
simplified by introducing a notation for exponentiation. Let x^i be an abbreviation for multiplying x • x • ... •
x, i times.

(10) NP = John + John and John + 2 John (and John)^2

+ 5 John (and John)^3 + 14 John (and John)^4 + ...

Note that parentheses are interpreted differently in algebraic equations than in context-free rules. In
context-free rules, parentheses denote optionality, whereas in equations they denote precedence relations among
algebraic operations.
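The successive approximation procedure can be sketched in a few lines of Python. This is our own illustration under stated assumptions, not the chart parser's implementation: a term c John (and John)^i is stored as the pair (i, c) in a Counter, the names refine and successive_approximation are hypothetical, and terms beyond a chosen cutoff max_ands are simply dropped so that each approximation stays finite.

```python
from collections import Counter


def refine(np_prev: Counter, max_ands: int) -> Counter:
    """One refinement step: NP_next = John + NP_prev . and . NP_prev.

    Key i stands for the sentence 'John (and John)^i'; the value is its
    ambiguity coefficient. Terms with more than max_ands 'and's are dropped.
    """
    np_next = Counter({0: 1})  # the bare term 'John'
    for i, ci in np_prev.items():
        for j, cj in np_prev.items():
            ands = i + j + 1  # concatenation contributes one extra 'and'
            if max_ands >= ands:
                np_next[ands] += ci * cj
    return np_next


def successive_approximation(steps: int, max_ands: int) -> Counter:
    """Start from NP_0 = 0 (the empty language) and refine `steps` times."""
    approx = Counter()
    for _ in range(steps):
        approx = refine(approx, max_ands)
    return approx


np3 = successive_approximation(3, max_ands=10)
print(dict(sorted(np3.items())))  # {0: 1, 1: 1, 2: 2, 3: 1}

np5 = successive_approximation(5, max_ands=4)
print([np5[i] for i in range(5)])  # [1, 1, 2, 5, 14]
```

The first printed result reproduces the coefficients of (9d), and after five steps the coefficients up to (and John)^4 agree with (10). A coefficient only becomes final once enough refinement steps have been taken, which is why the (and John)^3 entry of NP_3 is still 1 rather than 5.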

3.  Catalan  Numbers 

Ambiguity coefficients take on an important practical significance when we can model them directly
without resorting to successive approximation as above. This can result in substantial time and space savings
in certain special cases where there are much more efficient ways to compute the coefficients than successive
approximation (chart parsing). Equation (10) is such a special case; the coefficients follow a well-known
combinatoric series called the Catalan Numbers [8: pp. 388-389, pp. 531-533].⁶ This section will describe
Catalan numbers and their relation to parsing.

The first few Catalan numbers are: 1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, ... They are generated by the
closed form expression:⁷

(11) Cat_n = \binom{2n}{n} - \binom{2n}{n-1}


6. This fact was first pointed out to us by V. Pratt. We suspect that it is a generally well-known result in the formal language
community, though its origin is unclear.

7. \binom{a}{b} is known as a binomial coefficient. It is equivalent to a! / (b! (a-b)!), where a! is equal to the product of all integers between 1 and a.
Binomial coefficients are very common in combinatorics where they are interpreted as the number of ways to pick b objects out of a set
of a objects.
This formula can be explained in terms of parenthesized expressions, which are equivalent to trees. Cat_n is
the number of ways to parenthesize a formula of length n. There are two conditions on parenthesization: (a)
there must be the same number of open and close parentheses, and (b) they must be properly nested so that an
open parenthesis precedes its matching close parenthesis. The first term counts the number of sequences of
2n parentheses, such that there are the same number of opens and closes. The second term subtracts out cases
violating condition (b). This explanation is elaborated in [8: p. 531].
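The closed form is easy to evaluate directly. The Python sketch below is an illustration of the two-term expression described above (balanced sequences minus improperly nested ones); the function name catalan is our own choice, and the n = 0 case is handled separately on the assumption that the subtracted term is zero there, since there is no way to choose -1 items.

```python
from math import comb


def catalan(n: int) -> int:
    """Catalan number as C(2n, n) - C(2n, n - 1), for n >= 0."""
    if n == 0:
        return 1  # the subtracted term is taken to be zero when n = 0
    balanced = comb(2 * n, n)            # sequences with n opens and n closes
    badly_nested = comb(2 * n, n - 1)    # those violating proper nesting
    return balanced - badly_nested


print([catalan(n) for n in range(10)])
# [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862]
```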

It is very useful to know that the ambiguity coefficients are Catalan numbers because this observation
enables us to replace equation (10) with (12), where Cat_i denotes the i-th Catalan number. (All summations
range from 0 to ∞ unless noted otherwise.)

(12) NP = \sum_i Cat_i John (and John)^i

The i-th Catalan number is the number of binary trees that can be constructed over i phrases. This model
correctly predicts EQSP's behavior with prepositional phrases. That is, the EQSP parser [11] found exactly the
Catalan number of parse trees for each sentence in the following sequence:

1  It  was  the  number. 

1 It was the number of products.

2  It  was  the  number  of  products  of  products. 

5  It  was  the  number  of  products  of  products  of  products. 

14  It  was  the  number  of  products  of  products  of  products  of  products. 


These predictions continue to hold with as many as nine prepositional phrases (4862 parse trees).

4.  Table  Lookup 

We could improve EQSP’s performance on PPs if we could find a more efficient way to compute Catalan
numbers than chart parsing, the method currently employed by EQSP. Let us propose two alternatives: table
lookup and evaluating expression (11) directly. Both are very efficient over practical ranges of say no more
than 20 phrases or so.⁸ In both cases, the ambiguity of a sentence in grammar (6a) can be determined by
counting the number of occurrences of “and John” and then retrieving the Catalan of that number. These
approaches both take linear time (over practical ranges of n),⁹ a significant improvement over chart parsing,
which requires cubic time to parse sentences in these grammars.


8. The table lookup scheme ought to have a way to handle the theoretical possibility that there are an unlimited number of prepositional
phrases. The table lookup routine will employ a more traditional parsing algorithm (e.g., Earley's Algorithm) when the number of
phrases in the input sentence is not stored in the table.

So far we have shown how to compute in linear time the number of ambiguous interpretations of a
sentence in an “every way ambiguous” grammar. However, we are really interested in finding parse trees,
not just the number of ambiguous interpretations. We could extend the table lookup algorithm to find trees
rather than ambiguity coefficients, by modifying the table to store trees instead of numbers. For parsing
purposes, Cat_i can be thought of as a pointer to the i-th entry of the table. So, for a sentence in grammar (6a)
for example, the machine could count the number of occurrences of “and John” and then retrieve the table
entry for that number.

index   trees

0   {[John]}

1   {[John and John]}

2   {[[John and John] and John], [John and [John and John]]}


The table would be more general if it did not specify the lexical items at the leaves. Let us replace the table
above with

index   trees

0   {[x]}

1   {[x x]}

2   {[[x x] x], [x [x x]]}


and assume the machine can bind the x's to the appropriate lexical items.
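A small Python sketch of such a table lookup machine follows. It is an illustration under our own assumptions rather than a description of any existing system: the names shapes, CAT_TABLE, bind and pseudo_parse are hypothetical, trees are represented as nested tuples with the conjunction omitted, the table size of 8 entries is arbitrary, and the input is assumed to be a well-formed string of the form "X and X and ... X".

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def shapes(leaves: int) -> tuple:
    """All binary tree shapes over `leaves` leaves, with 'x' at each leaf."""
    if leaves == 1:
        return ("x",)
    result = []
    for k in range(1, leaves):  # k leaves on the left, the rest on the right
        for left in shapes(k):
            for right in shapes(leaves - k):
                result.append((left, right))
    return tuple(result)


# Entry i holds the tree shapes for a sentence with i occurrences of "and X".
# The size 8 is an arbitrary choice for this sketch.
CAT_TABLE = [shapes(i + 1) for i in range(8)]


def bind(shape, words):
    """Replace each 'x', left to right, with the next word from iterator `words`."""
    if shape == "x":
        return next(words)
    left, right = shape
    return (bind(left, words), bind(right, words))


def pseudo_parse(sentence: str) -> list:
    """Count the conjunctions, fetch the table entry, and bind the leaves."""
    tokens = sentence.split()
    conjuncts = tokens[0::2]       # every other token is a conjunct
    index = len(tokens[1::2])      # number of "and X" occurrences
    # An index beyond the table raises IndexError; a fuller system would
    # fall back to a conventional parser at that point.
    return [bind(shape, iter(conjuncts)) for shape in CAT_TABLE[index]]


for tree in pseudo_parse("Tom and Dick and Harry"):
    print(tree)
```

For the three-conjunct input the sketch returns the two bracketings of table entry 2 with the lexical items bound in left-to-right order.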

There is a real problem with this table lookup machine. The parse trees may not be exactly correct
because the power series computation assumed that multiplication was associative, which is an appropriate
assumption for counting ambiguity, but inappropriate for constructing trees. For example, we observed that
prepositional phrases and conjunction are both “every way ambiguous” grammars because their ambiguity
coefficients are Catalan numbers. However, it is not the case that they generate exactly the same parse trees.


9. The linear time result depends on the assumption that table lookup (or closed form computation) can be performed in constant time.
This may be a fair assumption over practical ranges of n, but it is not true in general.
Nevertheless we present the table lookup pseudo-parser here because it seems to be a speculative new
approach with considerable promise. It is often more efficient than a real parser, and the trees that it finds
may be just as useful as the correct ones for many practical purposes. For example, many speech recognition
projects employ a parser to filter out syntactically inappropriate hypotheses. However, a full parser is not
really necessary for this task; a recognizer such as this table lookup pseudo-parser may be perfectly adequate for
this task.

Furthermore, it is often possible to recover the correct trees from the output of the pseudo-parser. In
particular, the difference between prepositional phrases and conjunction could be accounted for by modifying
the interpretation of the PP category label, so that the trees would be interpreted correctly even though they
are not exactly correct. In short, the table lookup pseudo-parser is worth exploring even though the results
are not always correct. The results are close enough for many applications (e.g., speech recognition) and the
mistakes can often be corrected.

The table lookup approach works for primitive grammars. The next two sections will show how to
decompose composite grammars into series and parallel combinations of primitive grammars.

(13a) G = G_1 • G_2   series

(13b) G = G_1 + G_2   parallel

5. Parallel Decomposition

Parallel decomposition can be very useful for dealing with lexical ambiguity, as in

(14) ... to total with products near profits ...

where “total” can be taken as a noun or as a verb, as in:

(15a) The accountant brought the daily sales to total with products near profits organized according to
the new law.   noun

(15b) The daily sales were ready for the accountant to total with products near profits organized
according to the new law.   verb

The analysis of these sentences will make use of the additivity property of linear systems. That is, each
case, (15a) and (15b), will be treated separately, and then the results will be added together. Assuming “total”
is a noun, there are three prepositional phrases contributing Cat_3 bracketings, and assuming it is a verb, there
are two prepositional phrases for Cat_2 ambiguities. Combining the two cases produces Cat_3 + Cat_2 = 5 + 2 =
7 parses. Adding another prepositional phrase yields Cat_4 + Cat_3 = 14 + 5 = 19 ambiguities. (EQSP
behaved as predicted in both cases.)

This behavior is generalized by the following power series:

(16) \begin{Bmatrix} P N \\ to V \end{Bmatrix} \sum_i (Cat_{i+1} + Cat_i) (P N)^i

which is the sum of the two cases:

(17a) \sum_{i>0} Cat_i (P N)^i = P N \sum_i Cat_{i+1} (P N)^i   noun

(17b) to V \sum_i Cat_i (P N)^i   verb

This observation can be incorporated into the table lookup pseudo-parser outlined above. Recall that
Cat_i is interpreted as the i-th index in a table containing all binary trees dominating i + 1 leaves. Similarly, Cat_i +
Cat_{i+1} will be interpreted as an instruction to “append” the i-th entry and the (i+1)-st entry of the table.

(18) (ADD-TREES (CAT-TABLE i) (CAT-TABLE (+ i 1)))

(This can be implemented efficiently, given an appropriate representation of sets of trees.)

Now suppose there were an oracle that disambiguated the word "total". How could we incorporate this information once we have already parsed the input sentence and found that it was the sum of two Catalans? The parser can simply subtract out the inappropriate interpretations. If the oracle says that "total" is a verb, then (17a) would be subtracted from the combined sum, and if the oracle says that "total" is a noun, then (17b) would be subtracted.

Furthermore, suppose that we wanted to evaluate the usefulness of a particular oracle. For example, suppose that there was a semantic routine that could disambiguate "total", but this semantic routine is very expensive to execute, so we don't want to run it unless we are very sure that it has a desirable cost/benefit ratio. We need a way to estimate the usefulness of the semantic routine so that we don't waste time working on semantic constraints when they won't help very much. This analysis provides a very simple way to estimate the benefit of disambiguating "total". If it turns out to be a verb, then the (17a) trees have been ruled out, and if it turns out to be a noun, then the (17b) trees have been ruled out.
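The benefit estimate can be sketched in a few lines. This is only an illustration under the assumption that the oracle's answer is one of the strings "noun" or "verb"; the function `trees_ruled_out` is a hypothetical name, not part of the pseudo-parser.

```python
from math import comb


def catalan(n: int) -> int:
    """Catalan number Cat_n (Cat_0 = 1, Cat_1 = 1, Cat_2 = 2, ...)."""
    return comb(2 * n, n) // (n + 1)


def trees_ruled_out(trailing_pps: int, oracle_answer: str) -> int:
    """Number of parse trees eliminated once the oracle classifies 'total'."""
    noun_trees = catalan(trailing_pps + 1)  # coefficient of case (17a)
    verb_trees = catalan(trailing_pps)      # coefficient of case (17b)
    if oracle_answer == "verb":
        return noun_trees                   # subtract (17a)
    if oracle_answer == "noun":
        return verb_trees                   # subtract (17b)
    raise ValueError("oracle_answer must be 'noun' or 'verb'")


# With two trailing PPs there are 7 trees in all.
print(trees_ruled_out(2, "verb"), trees_ruled_out(2, "noun"))
```

A processor could compare these counts with the cost of running the semantic routine before deciding whether to call it.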
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6.  Series  Decomposition 

Suppose we have a non-terminal S which is a series combination of two other non-terminals, NP and VP.
By inspection, the power series of S is:

(19)  S  =  NP  •  VP 

This result is easily verified when there is an unmistakable dividing point between the subject and the predicate. For example, the verb "is" separates the PPs in the subject from those in the predicate in (20a), but not in (20b).

(20a)  The number of products over sales of ... is near the number of sales under ...    clearly divided
(20b)  Is the number of products over sales of ... near the number of sales under ...?    not clearly divided

In (20a), the total number of parse trees is the product of the number of ways of parsing the subject and the number of ways of parsing the predicate. Both the subject and the predicate produce a Catalan number of parses, and hence the result is the product of two Catalan numbers, which was verified by EQSP [11: p. 53]. This result can be formalized in terms of the power series:

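(21)  $\left( N \sum_i \mathrm{Cat}_i\,(P\,N)^i \right) \left( \mathrm{is} \sum_j \mathrm{Cat}_j\,(P\,N)^j \right)$

which is formed by taking the product of the two subcases:

(22a)  $N \sum_i \mathrm{Cat}_i\,(P\,N)^i$    subject

(22b)  $\mathrm{is} \sum_j \mathrm{Cat}_j\,(P\,N)^j$    predicate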

The power series says that the ambiguity of a particular sentence is the product of $\mathrm{Cat}_i$ and $\mathrm{Cat}_j$, where i is the number of PPs before "is" and j is the number after "is". This could be incorporated in the table lookup parser as an instruction to "multiply" the $i$th entry in the table times the $j$th entry. Multiplication is a cross-product operation; $L \times R$ generates the set of binary trees whose left sub-tree l is from L and whose right sub-tree r is from R.

(23)  $L \times R = \{ (l\ r) \mid l \in L \ \&\ r \in R \}$




This is a formal definition. For practical purposes, it may be more useful for the parser to output the list in the factored form:

(24)  (MULTIPLY-TREES (CAT-TABLE i) (CAT-TABLE j))

which is much more concise than a list of trees. It is possible, for example, that semantic processing can take advantage of factoring, capturing a semantic generalization that holds across all subjects or all predicates. Imagine, for example, that there is a semantic agreement constraint between predicates and arguments. For example, subjects and predicates might have to agree on the feature ±human. Suppose that we were given sentences where this constraint was violated by all ambiguous interpretations of the sentence. In this case, it would be more efficient to employ a feature vector scheme [3] which propagates the features in factored form. That is, it computes a feature vector for the union of all possible subjects, and a vector for the union of all possible VPs, and then compares (intersects) these vectors to check if there are any interpretations which meet the constraint. A system such as this, which keeps the parses in factored form, is much more efficient than one that multiplies them out. Even if semantics cannot take advantage of the factoring, there is no harm in keeping the representation in factored form, because it is straightforward to expand (24) into a list of trees (though it may be somewhat slow).
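The trade-off between the factored form and the expanded list can be illustrated with a small sketch. The class `FactoredParses` below is a hypothetical stand-in for MULTIPLY-TREES, and the tree labels are placeholder strings rather than real parse trees; it stores the two factors and multiplies them out only on request.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, Sequence, Tuple


@dataclass(frozen=True)
class FactoredParses:
    """Lazy cross product of two sets of sub-trees (an assumed representation)."""
    left: Sequence[str]    # e.g. all parses of the subject
    right: Sequence[str]   # e.g. all parses of the predicate

    def count(self) -> int:
        # The number of combined trees is known without building any of them.
        return len(self.left) * len(self.right)

    def expand(self) -> Iterator[Tuple[str, str]]:
        # Multiply out only when a consumer really needs the full list.
        return product(self.left, self.right)


subjects = ["subj-tree-1", "subj-tree-2"]
predicates = ["pred-tree-1", "pred-tree-2", "pred-tree-3", "pred-tree-4", "pred-tree-5"]
parses = FactoredParses(subjects, predicates)
print(parses.count())              # size of the cross product
print(next(parses.expand()))       # first combined (left, right) pair
```

The factored object holds only as many entries as the two factors together, whereas the expanded list grows with their product.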

This example is relatively simple because "is" helps the parser determine the value of i and j. Now let us return to example (20b) where "is" does not separate the two strings of PPs. Again, we determine the power series by multiplying the two subcases:

(25)  $\mathrm{is} \left( N \sum_i \mathrm{Cat}_i\,(P\,N)^i \right) \left( \sum_j \mathrm{Cat}_j\,(P\,N)^j \right) = \mathrm{is}\ N \sum_i \sum_j \mathrm{Cat}_i\,\mathrm{Cat}_j\,(P\,N)^{i+j}$

However, this form is not so useful for parsing because the parser cannot easily determine i and j, the number of prepositional phrases in the subject and the number in the predicate. It appears the parser will have to compute the product of two Catalans for each way of picking i and j, which is somewhat expensive.¹⁰ Fortunately the Catalan function has some special properties so that it is possible algebraically to remove the references to i and j. In the next section we will show how this expression can be reformulated in terms of n, the total number of PPs.


10.  Earley's algorithm and most other context-free parsing algorithms actually work this way.

6.1  Auto-Convolution of Catalan Grammars

Some readers may have noticed that expression (25) is in convolution form. We will make use of this in the reformulation. Notice that the Catalan series is a fixed point under auto-convolution (except for a shift); that is, multiplying a Catalan power series (i.e., $1 + x + 2x^2 + 5x^3 + 14x^4 + \cdots + \mathrm{Cat}_i x^i + \cdots$) with itself produces another polynomial with Catalan coefficients.¹¹ The multiplication is worked out below for the first few terms.

        1 +  x + 2x^2 +  5x^3 + 14x^4 + ...
    ×   1 +  x + 2x^2 +  5x^3 + 14x^4 + ...
    ---------------------------------------
        1 +  x + 2x^2 +  5x^3 + 14x^4 + ...
             x +  x^2 +  2x^3 +  5x^4 + ...
                 2x^2 +  2x^3 +  4x^4 + ...
                         5x^3 +  5x^4 + ...
    +                           14x^4 + ...
    ---------------------------------------
        1 + 2x + 5x^2 + 14x^3 + 42x^4 + ...

This  property  can  be  summarized  as: 

(26)  $\sum_i \mathrm{Cat}_i\,x^i \sum_j \mathrm{Cat}_j\,x^j = \sum_n \mathrm{Cat}_{n+1}\,x^n$

where  n  equals  i  +  j. 
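Identity (26) can be checked on truncated series. The sketch below assumes the closed form $\mathrm{Cat}_n = \binom{2n}{n}/(n+1)$; the helper `convolve` is an illustrative name that performs the convolution the "long way", summing over every split of n into i + j.

```python
from math import comb


def catalan(n: int) -> int:
    """Catalan number Cat_n (Cat_0 = 1, Cat_1 = 1, Cat_2 = 2, ...)."""
    return comb(2 * n, n) // (n + 1)


def convolve(a, b):
    """Coefficients of the product of two power series, truncated to the shorter length."""
    size = min(len(a), len(b))
    return [sum(a[i] * b[n - i] for i in range(n + 1)) for n in range(size)]


cat = [catalan(i) for i in range(8)]
squared = convolve(cat, cat)
shifted = [catalan(n + 1) for n in range(8)]
print(squared[:5])          # first coefficients of the squared series
print(squared == shifted)   # does the shifted Catalan series agree?
```

The left side costs a sum over all (i, j) pairs for each n, whereas the right side is a single table lookup at index n + 1.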

Intuitively, this equation says that if we have two "every way ambiguous" (Catalan) constructions, and we combine them in every possible way (convolution), the result is an "every way ambiguous" (Catalan) construction. With this observation, equation (25) reduces to:

(27)  $\mathrm{is} \left( N \sum_i \mathrm{Cat}_i\,(P\,N)^i \right) \left( \sum_j \mathrm{Cat}_j\,(P\,N)^j \right) = \mathrm{is}\ N \sum_n \mathrm{Cat}_{n+1}\,(P\,N)^n$

Hence the number of parses in the auxiliary-inverted case is the Catalan of one more than in the non-inverted cases. As predicted, EQSP found the following inverted sentences to be more ambiguous than their non-inverted counterparts (previously discussed on page 12) by one Catalan number.

 1  Was the number?
 2  Was the number of products?
 5  Was the number of products of products?
14  Was the number of products of products of products?
42  Was the number of products of products of products of products?

 1  It was the number.
 1  It was the number of products.
 2  It was the number of products of products.
 5  It was the number of products of products of products.
14  It was the number of products of products of products of products.

11.  The proof immediately follows from the z-transform of the Catalan series [8: p. 388]: $zB(z)^2 = B(z) - 1$.

How could this result be incorporated into the table lookup pseudo-parser? Recall that the pseudo-parser implements Catalan grammars by returning an index into the Catalan table. For example, if there were i PPs, the parser would return: (CAT-TABLE i). We now extend the indexing scheme so that the parser implements a series connection of two Catalan grammars by returning one higher index than it would for a simple Catalan grammar. That is, if there were n PPs, the parser would return: (CAT-TABLE (+ n 1)).
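The extended indexing scheme can be sketched on the "Was the number of products ..." examples. This is a toy illustration only: it assumes that every occurrence of "of" starts a PP and that a sentence-initial "was" signals the inverted (series-connected) case, and it returns the table entry's count rather than the trees themselves. The function names are hypothetical.

```python
from math import comb


def catalan(n: int) -> int:
    """Catalan number Cat_n (Cat_0 = 1, Cat_1 = 1, Cat_2 = 2, ...)."""
    return comb(2 * n, n) // (n + 1)


def count_pps(words, preposition="of"):
    """Number of prepositional phrases, assuming each preposition starts one."""
    return sum(1 for word in words if word.lower() == preposition)


def ambiguity(sentence: str) -> int:
    """Size of the table entry the pseudo-parser would return for this sentence."""
    words = sentence.rstrip("?.").split()
    n = count_pps(words)
    inverted = words[0].lower() == "was"  # toy test for auxiliary inversion
    # A series connection of two Catalan grammars uses index n + 1.
    return catalan(n + 1) if inverted else catalan(n)


print(ambiguity("Was the number of products of products?"))
print(ambiguity("It was the number of products of products."))
```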

Series connections of Catalan grammars are very common in everyday natural language, as illustrated by the following two sentences, which have received considerable attention in the literature because the parser cannot separate the direct object from the prepositional complement.

(28a)  I saw the man on the hill with a telescope ...

(28b)  Put the block in the box on the table in the kitchen ...

Both examples have a Catalan number of ambiguities because the auto-convolution of a Catalan series yields another Catalan series.¹² This result can improve parsing performance because it suggests ways to re-organize (compile) the grammar so that there will be fewer references to quantities that are not readily available. This re-organization will reap benefits that chart parsers (e.g. Earley's algorithm) do not currently achieve because the re-organization is taking advantage of a number of combinatoric regularities, especially convolution, that are not easily encoded into a chart. Section 9 will present an example of the re-organization.


12.  There is a difference between these two sentences because "put" subcategorizes for two objects unlike "see". Suppose we analyze "see" as lexically ambiguous between two senses, one which selects for exactly two objects like "put" and one which selects for exactly one object as in "I saw it." The first sense contributes the same number of parses as "put" and the second sense contributes an additional Catalan factor.
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6.2  Chart  Parsing 


Perhaps it is worthwhile to reformulate chart parsing in our terms in order to show which of the above results can be captured by such an approach and which cannot. Traditionally chart parsers maintain a chart (or matrix) M, whose entries $M_{ij}$ contain the set of category labels which span from position i to position j in the input sentence. This is accomplished by finding a position k in between i and j such that there is a phrase from i to k which can combine with another phrase from k to j. An implementation of the inner loop looks something like:

(29)  M_ij := ∅
      loop for k from i to j do
          M_ij := M_ij ∪ M_ik * M_kj

Essentially, then, a chart parser is maintaining the invariant

(30)  $M_{ij} = \sum_k M_{ik} \cdot M_{kj}$

Recall that addition and multiplication were previously defined over polynomials. We can preserve these definitions if we modify the contents of the chart. Let us replace the set of category labels in $M_{ij}$ with a set of factored polynomials. That is, let $M^x_{ij}$ denote the polynomial describing the ways to parse a phrase of category x from position i to position j. For example, the notation

(31)  + 

indicates that there are two ways to combine an NP and a VP to form an S from position 0 to position 4.

This formulation of the chart can be compared with serial and parallel decomposition. Note that $M^{NP}_{ik} \cdot M^{VP}_{kj}$ is essentially the same as (MULTIPLY-TREES $M^{NP}_{ik}$ $M^{VP}_{kj}$). Similarly, adding matrix elements corresponds to ADD-TREES. Hence, chart parsing is more similar to serial and parallel combinations than one might have suspected. When the grammar is factored appropriately, chart parsers will be able to take advantage of the serial and parallel decompositions discussed above.

However, the examples above illustrate cases where chart parsers are inefficient. In particular, chart parsers cannot take advantage of convolution and the "every way ambiguous" generalization. That is, Earley's algorithm performs convolution the "long way", by picking each possible dividing point k, and parsing from i to k and from k to j. It is incapable of reducing the convolution of two Catalans as we did above. Similarly, Earley's algorithm is incapable of using the "every way ambiguous" generalization. That is, it requires $O(n^3)$ time to parse Catalan grammars because there are no constraints on the choice of i, j and k. The algorithm will eventually enumerate all possible values of i, j and k. We suggest that a processor ought to be able to notice the lack of constraints, and thus avoid enumerating the space as Earley's algorithm does.

Finally, in passing, we have one constructive suggestion for chart parsers. We observe that it is possible to count the number of ambiguous interpretations in $O(n^3)$ time. This is an improvement over the obvious algorithm which multiplies out all the trees just as if they were being printed. (Such an exponential algorithm was actually implemented in EQSP.) We suggest keeping a second matrix A, where $A^x_{ij}$ holds the number of ways of deriving a phrase of category x between i and j. The two matrices, A and M, are almost identical, except that A holds integers and M holds polynomials. Accordingly, addition and multiplication are defined slightly differently on the two matrices. In A, they map integers into integers in the obvious way; in M, they map polynomials into polynomials as discussed above. Note that both matrices, A and M, can be computed with exactly the same sequences of multiplications and additions. Hence, it is possible to compute the number of ambiguous interpretations in cubic time.
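The counting matrix A can be sketched as a CYK-style table. The code below is an illustration under two assumptions that are not in the text: the grammar is given in Chomsky normal form, and rules are passed as plain Python collections. The names `count_parses`, `lexical` and `binary` are hypothetical.

```python
from collections import defaultdict


def count_parses(words, lexical, binary, start):
    """Count derivations of `start` over `words` without building any trees.

    lexical: dict mapping a word to the set of categories it can have
    binary:  list of (parent, left, right) rules
    """
    n = len(words)
    # A[i][j][x] = number of ways of deriving category x between positions i and j
    A = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for category in lexical.get(word, ()):
            A[i][i + 1][category] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                # each dividing point
                for parent, left, right in binary:
                    # integer multiply and add, mirroring the operations on M
                    A[i][j][parent] += A[i][k][left] * A[k][j][right]
    return A[0][n][start]


# The Catalan grammar A -> A A | a.
lexical = {"a": {"A"}}
binary = [("A", "A", "A")]
print([count_parses(["a"] * n, lexical, binary, "A") for n in range(1, 6)])
```

The three nested loops over i, k and j give the cubic bound; the exponential cost arises only if the trees themselves are multiplied out.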

6.3  Auto-Convolution of Unit Step Grammars

Let us return to the discussion of convolution. This section will illustrate a second practical example of convolution. Consider the following grammar ("Λ" denotes the empty string):¹³

(32)  A → a A | Λ

We call this grammar a unit step grammar because all of its ambiguity coefficients are 1.

(33)  $A = 1 + a + a^2 + a^3 + a^4 + a^5 + \cdots = \sum_n a^n$

In other words, the grammar is unambiguous.¹⁴ Embedded sentences are a typical example of (32) in English.

(34)  I believe you said he thought you were ...

Suppose for the sake of discussion that we choose to analyze adjuncts with a right branching grammar. (By convention, terminal symbols appear in lower case.)

(35)  ADJS → adj ADJS | Λ


13.  Note that the empty language { } is distinct from the language of the empty string {Λ}. In particular, {Λ} is the identity element under series connection and { } is the identity element under parallel connection. Thus, {Λ} is modeled as 1 in the power series representation, whereas { } is modeled as 0.

14.  Unit step grammars are not exactly the same as unambiguous grammars. The ambiguity coefficients of a unit step grammar are all 1, whereas the ambiguity coefficients of an unambiguous grammar are either 1 or 0.
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so  that 

(36)  Will you go to the store tomorrow in the morning about 10:00 after ...?

has one parse, independent of the number of adjuncts. A similar analysis of adjuncts is adopted in [7]. This analysis can also be defended on performance grounds as an efficiency approximation. (This approximation is in the spirit of pseudo-attachment [1].)

The power series is

(37)  $\mathrm{ADJS} = \sum_i \mathrm{adj}^i$

Now, how many ambiguities will there be if we add a second clause to (36) as in:

(38)  I will ask if you will go to the store tomorrow in the morning about 10:00 after ...?

Some of the adjuncts will attach to "go" and the rest will attach to "ask". The number of parses is determined by multiplying the two subgrammars.

(39)  $\mathrm{ADJS} \cdot \mathrm{ADJS} = \sum_i \mathrm{adj}^i \sum_j \mathrm{adj}^j = \sum_i \sum_j \mathrm{adj}^{i+j}$

This equation has the same problem as equation (25); because there is no clear dividing line between the adjuncts that attach to "go" and the ones that attach to "ask", it is not very easy for the parser to determine i and j. Again, it might appear that the parser will have to try all possible values of i and j, a moderately expensive process. However, there are some special properties of the step function that enable us to remove the references to i and j in equation (39). In engineering jargon, the convolution of two steps is a ramp. That is, the product of two polynomials with step coefficients is a polynomial with increasing coefficients [8: p. 89, equation 16]. We have multiplied out the first few terms below.

        1 +  x +  x^2 +  x^3 +  x^4 + ...
    ×   1 +  x +  x^2 +  x^3 +  x^4 + ...
    -------------------------------------
        1 + 2x + 3x^2 + 4x^3 + 5x^4 + ...
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The  general  result  is: 

(40)  $\sum_i x^i \sum_j x^j = \sum_n (n+1)\,x^n$

Now equation (39) can be simplified so that the references to i and j are replaced with n, the total number of adjuncts. This is much easier for the parser to deal with because for a given input sentence there is a single value for n, whereas there are multiple values for i and j.

(41)  $\sum_i \mathrm{adj}^i \sum_j \mathrm{adj}^j = \sum_n (n+1)\,\mathrm{adj}^n$

This says that a string of n adjuncts induces n+1 parse trees, because there are n+1 ways to cut the string into two substrings.¹⁵ Now suppose there were three matrix clauses instead of just two.

(42)  I will ask if he will persuade you to go to the store tomorrow in the morning about 10:00 after ...?

The number of parses in this case is the convolution of three steps.

(43)  $\sum_i \mathrm{adj}^i \sum_j \mathrm{adj}^j \sum_k \mathrm{adj}^k$

Again this form is ill-suited for parsing because there is no easy way to determine i, j and k. However, it is possible to remove the references to the offending variables by taking advantage of some special properties of the step function. In particular, there is a closed form for the convolution of d+1 step functions [8: p. 90, equation 20]:

(44)  $\left( \sum_i x^i \right)^{d+1} = \sum_n \binom{n+d}{d}\,x^n$

Now we can remove the references to i, j and k:

(45)  $\left( \sum_i \mathrm{adj}^i \right)^3 = \sum_n \binom{n+2}{2}\,\mathrm{adj}^n = \sum_n \tfrac{1}{2}(n+1)(n+2)\,\mathrm{adj}^n$


15.  The string can be cut between any two words (n-1 places) or at either end (2 places).




These examples show that standard well-known combinatorics can be used to determine the number of ambiguities in many common cases.
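The closed forms (40) and (44) can be compared against repeated convolution. The sketch below treats adj as a single formal variable and truncates every series to a fixed number of terms; the helpers `convolve` and `step_power` are illustrative names, not part of the paper.

```python
from math import comb


def convolve(a, b):
    """Coefficients of the product of two power series, truncated to the shorter length."""
    size = min(len(a), len(b))
    return [sum(a[i] * b[n - i] for i in range(n + 1)) for n in range(size)]


def step_power(factors: int, terms: int):
    """Coefficients of (1 + x + x^2 + ...)^factors, multiplied out the long way."""
    step = [1] * terms
    result = step
    for _ in range(factors - 1):
        result = convolve(result, step)
    return result


print(step_power(2, 6))  # two clauses: the ramp of (40)
print(step_power(3, 6))  # three clauses: the coefficients of (45)
print(all(step_power(d + 1, 8) == [comb(n + d, d) for n in range(8)] for d in range(5)))
```

For a sentence with n adjuncts and d + 1 clauses, the parser therefore needs only the single value $\binom{n+d}{d}$ rather than an enumeration of every split.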

7.  Computing  the  Power  Series  Directly  from  the  Grammar 

In fact, the result derived in the previous section can be computed directly from the grammar itself. First we translate the grammar into an equation in the usual way. That is, ADJS is modeled as a parallel combination of two subgrammars, adj ADJS and Λ. (Recall that Λ is modeled as 1 because it is the identity element under series combination.)

(46a)  ADJS → adj ADJS | Λ
(46b)  ADJS = adj · ADJS + 1

We can simplify (46b) so the right hand side is expressed in terminal symbols alone, with no references to non-terminals. This is very useful for processing because it is much easier for the parser to determine the presence or absence of terminals than of non-terminals. That is, it is easier for the parser to determine, for example, whether a word is an adj than it is to decide whether a substring is an ADJS phrase. The simplification moves all references to ADJS to the left hand side, by subtracting from both sides,

(46c)  ADJS - adj · ADJS = 1

factoring the left hand side,

(46d)  (1 - adj) ADJS = 1

and dividing from both sides.

(46e)  ADJS = (1 - adj)^{-1}

This result is equivalent to the step formulation (37), as can be seen by performing the long division:

$(1 - \mathrm{adj})^{-1} = 1 + \mathrm{adj} + \mathrm{adj}^2 + \mathrm{adj}^3 + \cdots$

The purpose of this section was twofold. First, we presented a simpler derivation of the power series for a unit step grammar. Secondly, and more importantly, we have introduced the notion of division. We now have four combination rules:

(47a)  series  combination  (multiplication) 

(47b)  parallel  combination  (addition) 

(47c)  inverse  of  series  combination  (division) 

(47d)  inverse  of  parallel  combination  (subtraction) 

Series and parallel combinations are frequently found in many grammar formalisms currently employed in the literature (e.g. context-free grammars, ATNs), and consequently, they required very little motivation. Subtraction was introduced as a "ruling-out" operation. The next section will provide an intuition for division in terms of ATNs.
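Division of power series, as used in (46e), can be carried out coefficient by coefficient. The following sketch assumes that adj is treated as a single commuting variable x and that the divisor has constant term 1; `series_inverse` is a hypothetical helper name.

```python
def series_inverse(a, terms: int):
    """First `terms` coefficients of 1 / a(x) by long division, assuming a[0] == 1."""
    if not a or a[0] != 1:
        raise ValueError("the constant term of the divisor must be 1")
    inverse = [1]
    for n in range(1, terms):
        # The coefficient of x^n in a(x) * inverse(x) must be zero.
        partial = sum(a[k] * inverse[n - k] for k in range(1, min(n, len(a) - 1) + 1))
        inverse.append(-partial)
    return inverse


print(series_inverse([1, -1], 6))     # 1 / (1 - x): the unit step
print(series_inverse([1, -2, 1], 5))  # 1 / (1 - x)^2: the ramp
```

Dividing by (1 - x) recovers the unit step coefficients of (37), and dividing by its square recovers the ramp of (40).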

8.  Computing  the  Power  Series  from  the  ATN 

This section will re-derive the power series for the unit step grammar directly from the ATN representation by treating the networks as flow graphs [13]. The graph transformations presented here are directly analogous to the algebraic simplifications employed in the previous section.

First we translate the grammar into an ATN in the usual way [16].


(48)  ADJS → adj ADJS | Λ

(49)  ADJS: [ATN diagram; arc labels: Cat adj, Push ADJS, Jump, Pop]


This graph can be simplified by performing a compiler optimization called tail recursion ([2] and references therein). This transformation replaces the final push arc with a jump:

(50)  ADJS: [ATN diagram; the final Push arc is replaced by a Jump arc]

Tail recursion corresponds directly to the algebraic operations of moving the ADJS terms to the left hand side, factoring out the ADJS, and dividing from both sides.

Then we remove the top jump arc by series reduction. This step corresponds to multiplying by 1 since a jump arc is the ATN representation for the identity element under series combination.

(51)  ADJS: [ATN diagram]

The loop can be treated as an infinite series:

(52)  $1 + \mathrm{adj} + \mathrm{adj}^2 + \mathrm{adj}^3 + \cdots$

where the zero-th term corresponds to zero iterations around the loop, the first term corresponds to a single iteration, the second term to two iterations, and so on. Recall that (52) is equivalent to:

(53)  $\frac{1}{1 - \mathrm{adj}}$

With this observation, it is possible to open the loop:

(54)  ADJS: [ATN diagram; arc labels: 1/(1 - adj), Jump]

After one final series reduction, the ATN is equivalent to expression (46e) above.

(54c)  ADJS: [ATN diagram; a single arc between two states]

Now we can motivate division in intuitive terms. Division is a loop in an ATN.

How can division be implemented? We have two answers. First, division can be implemented as an ATN loop. Alternatively, we can employ the table lookup scheme discussed above. That is, we formulate division as an infinite sum:

(55)  $\frac{1}{1 - \mathrm{adj}} = \sum_i \mathrm{adj}^i$

Then we construct a table such that the $i$th entry contains the $i$th ambiguity coefficient. In other words, the $i$th location in the table tells the parser how to parse i occurrences of adj. The table lookup scheme is somewhat more general than the ATN loop, because the table allows the $i$th coefficient to take on arbitrary values whereas the ATN loop restricts the coefficients to 1. For example, the Catalan grammar (56a) could be implemented with a table (56b), but not with an ATN loop.

(56a)  A → A A | a    Catalan grammar

(56b)  $\sum_i \mathrm{Cat}_i\,a^i$    table implementation

However, the table has the theoretical problem that it requires an infinite amount of memory. This is not a problem in practice since the regions of interest are not that large. It is unlikely, for example, that a sentence would contain more than twenty prepositional phrases.

So far we have discussed five primitive grammars: Catalan, Unit Step, 1, 0, and terminals, and four composition rules: addition, subtraction, multiplication and division. Furthermore we have outlined three implementation strategies: successive approximation (chart parsing), table lookup, and ATNs. We have seen that it is often possible to employ these tools in order to re-organize the grammar so that these implementations will perform more efficiently. We have identified certain situations where the ambiguity is combinatoric, and have sketched a few modifications to the grammar that enable processing to proceed in a more efficient manner. In particular, we have observed that it is important for the grammar to avoid referencing quantities that are not easily determined, such as the dividing point between a noun phrase and a prepositional phrase.

9.  An  Example 

Suppose  for  example  that  we  were  given  the  following  grammar: 

(57a)  S → NP VP ADJS
(57b)  S → V NP (PP) ADJS ADJS
(57c)  VP → V NP (PP) ADJS
(57d)  PP → P NP
(57e)  NP → N (PP)
(57f)  ADJS → adj ADJS | Λ

(In this example, we will assume no lexical ambiguity among N, V, P and adj.)

By inspection, we notice that NP and PP are Catalan grammars and that ADJS is a Step grammar.

(58a)  $\mathrm{PP} = \sum_{i>0} \mathrm{Cat}_i\,(P\,N)^i$

(58b)  $\mathrm{NP} = N \sum_i \mathrm{Cat}_i\,(P\,N)^i$

(58c)  $\mathrm{ADJS} = \sum_i \mathrm{adj}^i$

With these observations, the parser can process PPs, NPs and ADJS by counting the number of occurrences of terminal symbols and looking up those numbers in the appropriate tables. We now substitute (58a-c) into (57c)


(59)  $\mathrm{VP} = V\ \mathrm{NP}\ (1 + \mathrm{PP})\ \mathrm{ADJS} = V \left( N \sum_i \mathrm{Cat}_i\,(P\,N)^i \right) \left( \sum_j \mathrm{Cat}_j\,(P\,N)^j \right) \left( \sum_i \mathrm{adj}^i \right)$

and simplify the convolution of the two Catalan functions

(60)  $\mathrm{VP} = V \left( N \sum_i \mathrm{Cat}_{i+1}\,(P\,N)^i \right) \left( \sum_i \mathrm{adj}^i \right)$

so that the parser can also find VPs by just counting occurrences of terminal symbols. Now we simplify (57a-b) so that S phrases can also be parsed by just counting occurrences of terminal symbols. First, translate (57a-b) into the equation:

(61)  S = NP VP ADJS + V NP (1 + PP) ADJS ADJS

and then expand VP

(62)  S = NP (V NP (1 + PP) ADJS) ADJS + V NP (1 + PP) ADJS ADJS

and factor

(63)  S = (NP + 1) V NP (1 + PP) ADJS^2


This can be simplified considerably because

(64)  $\mathrm{NP}\,(1 + \mathrm{PP}) = N \sum_i \mathrm{Cat}_i\,(P\,N)^i \sum_j \mathrm{Cat}_j\,(P\,N)^j = N \sum_i \mathrm{Cat}_{i+1}\,(P\,N)^i$

and

(65)  $\mathrm{ADJS}^2 = \sum_i \mathrm{adj}^i \sum_j \mathrm{adj}^j = \sum_i (i+1)\,\mathrm{adj}^i$

so that

(66)  $S = \left( N \sum_i \mathrm{Cat}_i\,(P\,N)^i + 1 \right) V\,N \sum_i \mathrm{Cat}_{i+1}\,(P\,N)^i \sum_i (i+1)\,\mathrm{adj}^i$

which has the following ATN realization:

(67)  [ATN diagram]

The entire example grammar has now been compiled into a form that is easier for parsing. This formula says that sentences are all of the form:

(68)  S → (N (P N)*) V N (P N)* adj*

which could be recognized by the following finite state machine:

(69)  [finite state machine diagram]


Furthermore, the number of parse trees for a given input sentence can be found by multiplying three numbers: (a) the Catalan of the number of P N's before the verb, (b) the Catalan of one more than the number of P N's after the verb, and (c) the ramp of the number of adj's. For example, the sentence
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(70)  The  man  on  the  hill  saw  the  boy  with  a  telescope  yesterday  in  the  morning. 

has $\mathrm{Cat}_1 \cdot \mathrm{Cat}_2 \cdot 3 = 6$ parses. That is, there is one way to parse "the man on the hill", two ways to parse "saw the boy with a telescope" (either "telescope" is a complement of "see" as in (71a-c) or it is attached to "boy" as in (71d-f)), and three ways to parse the adjuncts (they could both attach to the S (71a,d), or they could both attach to the VP (71b,e), or they could split (71c,f)).

(71a)  [The man on the hill [saw the boy with a telescope] [yesterday in the morning.]]

(71b)  The man on the hill [[saw the boy with a telescope] [yesterday in the morning.]]

(71c)  The man on the hill [[saw the boy with a telescope] yesterday] in the morning.

(71d)  [The man on the hill saw [the boy with a telescope] [yesterday in the morning.]]

(71e)  The man on the hill [saw [the boy with a telescope] [yesterday in the morning.]]

(71f)  The man on the hill [saw [the boy with a telescope] yesterday] in the morning.

All and only these possibilities are permitted by the grammar.
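The counting procedure for the compiled grammar can be sketched as follows. The code assumes the input has already been reduced to a sequence of the terminal categories N, P, V and adj (determiners dropped, no lexical ambiguity); the regular expression and the function `count_parses` are illustrative, written to follow pattern (68) and formula (66).

```python
import re
from math import comb


def catalan(n: int) -> int:
    """Catalan number Cat_n (Cat_0 = 1, Cat_1 = 1, Cat_2 = 2, ...)."""
    return comb(2 * n, n) // (n + 1)


# Pattern (68) over space-separated tags: (N (P N)*) V N (P N)* adj*
PATTERN = re.compile(r"^(?:N((?: P N)*) )?V N((?: P N)*)((?: adj)*)$")


def count_parses(tags) -> int:
    """Number of parse trees according to (66); 0 if the tags do not fit (68)."""
    match = PATTERN.match(" ".join(tags))
    if match is None:
        return 0
    before = (match.group(1) or "").count("P")  # P N pairs before the verb
    after = match.group(2).count("P")           # P N pairs after the verb
    adjuncts = match.group(3).count("adj")
    # An absent subject contributes the factor 1, which equals catalan(0).
    return catalan(before) * catalan(after + 1) * (adjuncts + 1)


# (70) The man on the hill saw the boy with a telescope yesterday in the morning.
print(count_parses("N P N V N P N adj adj".split()))
```

The parser touches each word once to count terminals and then performs three table lookups, with no search over dividing points.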

10.  Lexical  Restrictions 

Now  suppose  there  were  an  oracle  (c.g.  lexical  restrictions)  that  disambiguated  some  of  these 
possibilities.  How  could  we  incorporate  this  information  once  we  have  already  parsed  the  input  sentence  as 
above?  For  example,  the  verb  “sec"  has  two  lexical  forms,  a  predicate  of  two  arguments  as  in  “1  saw  it”  and  a 
predicate  on  three  arguments  as  in  “I  saw  it  with  a  telescope".  Now  suppose  we  had  an  oracle  which 
disambiguated  these  two  possibilities.  How  could  we  take  advantage  of  this  information? 

Consider  the  two  argument  case  first.  The  previously  assumed  VP  grammar  (72a)  simplifies  to  (72b) 
with  the  two  argument  restriction. 

(72a)  VP → V NP (PP) ADJS
(72b)  VP → V NP ADJS

If we re-derive the power series for S, we obtain:

(73)  S = ( N \sum_i Cat_i (P N)^i + 1 ) V N \sum_i Cat_i (P N)^i \sum_i (i + 1) adj^i

This equation is the same as (66) except that Cat_{i+1} in (66) has been replaced with Cat_i. The Cat_{i+1} resulted
from convolving the PPs generated in object position with those generated in complement position. Under
the two argument restriction, it is no longer possible to generate any PPs in complement position, and hence
all the PPs must be in object position. There are Cat_i ways to put them in object position as we have
discussed.

With this formula, we see that three of the six parses given in (71) meet the two argument restriction.
That is, there is still only one way to parse “the man on the hill” and three ways to parse the adjuncts, by the
same reasoning applied previously. However, there are now only Cat_1 ways to parse “saw the boy with a
telescope” whereas there were Cat_2 ways before. The complement interpretations (71a-c) have been
excluded by the two argument restriction.
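
To see how much the restriction prunes as more PPs follow the verb, the sketch below tabulates the VP factor with and without the two argument restriction. It assumes, as in the discussion above, that j post-verbal PPs contribute Cat_{j+1} in the unrestricted grammar and Cat_j under the restriction; `vp_coefficients` is a hypothetical helper name, not something taken from the original system.

```python
from math import comb


def catalan(n: int) -> int:
    """Return the n-th Catalan number (1, 1, 2, 5, 14, ...)."""
    return comb(2 * n, n) // (n + 1)


def vp_coefficients(j: int) -> dict:
    """Ways to attach j post-verbal PPs under each lexical assumption."""
    return {
        "unrestricted": catalan(j + 1),  # Cat_{j+1}, as in (66)
        "two_argument": catalan(j),      # Cat_j, as in (73)
    }


for j in range(5):
    c = vp_coefficients(j)
    print(j, c["unrestricted"], c["two_argument"])
```

For sentence (70), j = 1, so the VP factor drops from 2 to 1 and the total from 6 to 3.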

Now suppose the oracle had selected the three argument form of “see”. How could we take advantage
of this information? In this case, the power series for S is the difference between (66) and (73).

(74)  S = ( N \sum_i Cat_i (P N)^i + 1 ) V N \sum_i (Cat_{i+1} - Cat_i) (P N)^i \sum_i (i + 1) adj^i

We hope to generalize this approach to handle selectional restrictions and agreement facts.
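
As a worked instance of the three argument case, consider sentence (70) again, with one PP in the subject, one PP after the verb, and two adjuncts. The arithmetic below is only a check on the coefficients, using Cat_1 = 1 and Cat_2 = 2:

```latex
\begin{align*}
\text{unrestricted:}\quad
  & \mathrm{Cat}_1 \cdot \mathrm{Cat}_2 \cdot 3 = 1 \cdot 2 \cdot 3 = 6 \\
\text{two argument form:}\quad
  & \mathrm{Cat}_1 \cdot \mathrm{Cat}_1 \cdot 3 = 1 \cdot 1 \cdot 3 = 3 \\
\text{three argument form:}\quad
  & \mathrm{Cat}_1 \cdot (\mathrm{Cat}_2 - \mathrm{Cat}_1) \cdot 3 = 1 \cdot (2 - 1) \cdot 3 = 3
\end{align*}
```

The two restricted counts sum to the unrestricted count, as they must if the three argument series is the difference of the other two. The three parses that remain under the three argument form are the complement interpretations (71a-c).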

11.  Inverse  Transforms 

(Inverse transforms are a fairly self-contained topic which can be left for a second reading of this paper.)

The previous few sections have outlined how it might be possible to use formal power series to compile
a grammar into a form for more efficient processing. This section will discuss the inverse process. That is,
given a compiled representation of the grammar, how can we recover a form suitable for linguistic analysis?
This section will present a partial solution which we found very useful for analyzing EQSP.

Let us consider an anecdotal example based on our experience with the EQSP conjunction mechanism.
Deep inside the code, there was a function called syntactically-parallelp which decided whether or not to
conjoin two constituents. Over the years, this function had acquired so many special case heuristics that it was
no longer understandable. However, we were able to determine the ambiguity coefficients by running EQSP
on the following sequence of conjunction sentences:

1   It was.

1   It was actual products.

2   It was actual products and actual products.

3   It was actual products and actual products and actual products.

5   It was actual products and actual products and actual products and actual products.

8   It was actual products and actual products and actual products and actual products and actual
    products.

13  It was actual products and actual products and actual products and actual products and actual
    products and actual products.

21  It was actual products and actual products and actual products and actual products and actual
    products and actual products and actual products.

To our surprise the ambiguity coefficients did not follow the Catalan sequence as predicted, but rather they
followed another well-known sequence called the Fibonacci numbers [8]. The first few Fibonacci numbers
are 1, 1, 2, 3, 5, 8, 13, 21, ... The next value is formed by taking the sum of the two previous values, or more
precisely:

(75)  Fib_0 = Fib_1 = 1

      Fib_n = Fib_{n-1} + Fib_{n-2}

We can model the sentences above with the following power series (ignoring the word “and” which
complicates the analysis in ways that are irrelevant to the current discussion):


(76)  S = It was \sum_i Fib_i (actual products)^i

We  were  then  able  to  recover  the  grammar  from  the  power  series  because  the  Fibonacci  series  has  a  well- 
known  inverse  transform.  That  is,  a  power  series  with  Fibonacci  coefficients  obeys  the  following  identity. 

(77)  \sum_i Fib_i x^i = \frac{1}{1 - x - x^2}

The reader can verify that this identity is correct by performing the long division. We were fortunate in this
case that the inverse transform for the Fibonacci numbers has a well-known closed form. In general, such
closed forms are very difficult to discover (if they exist at all), and for this reason, it can be very difficult or
even impossible to find a linguistically attractive grammar for an arbitrary processor. Nevertheless, closed
forms do exist for a large number of interesting cases. With some practice and a few educated guesses based
on partial knowledge of what the machine is doing, one can successfully “crack” quite a number of
constructions. At least, this has been our experience with EQSP.
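
The long division mentioned above is easy to carry out by machine. The sketch below is only a numerical check, assuming the denominator 1 - x - x^2 (the standard generating function for a Fibonacci sequence that starts 1, 1); `reciprocal_series` is an illustrative name of our own.

```python
def reciprocal_series(denominator, terms):
    """First `terms` coefficients of 1 / denominator(x), by long division.

    denominator[k] is the coefficient of x**k; denominator[0] must be 1
    so that every quotient digit stays an integer.
    """
    if denominator[0] != 1:
        raise ValueError("constant term of the denominator must be 1")
    quotient = []
    # Running remainder, starting from the numerator 1.
    remainder = [1] + [0] * (terms + len(denominator))
    for n in range(terms):
        q = remainder[n]
        quotient.append(q)
        # Subtract q * x**n * denominator from the remainder.
        for k, d in enumerate(denominator):
            remainder[n + k] -= q * d
    return quotient


# 1 / (1 - x - x^2)
print(reciprocal_series([1, -1, -1], 8))
```

The first eight quotient coefficients are 1, 1, 2, 3, 5, 8, 13, 21, matching the ambiguity counts observed for the conjunction sentences.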

Returning  to  the  conjunction  sentences,  we  now  have  a  closed  form  of  the  power  series: 


(78)  S = It was \frac{1}{1 - (actual products) - (actual products)^2}

This has the following ATN realization:


(79)  [Figure: ATN realization of (78), including a Jump arc]


We observe that EQSP employs a heuristic which prevents conjuncts from attaching more than two phrases
back. A full non-heuristic conjunction mechanism would permit conjuncts to “fold back” arbitrarily far, in
which case the conjunction mechanism would be a Catalan grammar.
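
One way to make this observation concrete is to read the closed form 1/(1 - x - x^2) as a loop that consumes either one or two phrases on each pass, since 1/(1 - y) = 1 + y + y^2 + ... with y = x + x^2. The sketch below enumerates such groupings and prints the Catalan numbers alongside for comparison. This is our own illustrative reading of the closed form, not a reconstruction of the EQSP code, and the Catalan column is shown without committing to a particular index offset for a full conjunction grammar.

```python
from math import comb


def groupings(n):
    """All ways to split n phrases into consecutive blocks of length 1 or 2."""
    if n == 0:
        return [()]
    result = [(1,) + rest for rest in groupings(n - 1)]
    if n >= 2:
        result += [(2,) + rest for rest in groupings(n - 2)]
    return result


def catalan(n: int) -> int:
    """Return the n-th Catalan number (1, 1, 2, 5, 14, ...)."""
    return comb(2 * n, n) // (n + 1)


for n in range(8):
    # n, number of bounded groupings (Fibonacci growth), Catalan number
    print(n, len(groupings(n)), catalan(n))
```

The bounded mechanism grows like the Fibonacci numbers (1, 1, 2, 3, 5, 8, 13, 21), whereas the Catalan numbers (1, 1, 2, 5, 14, 42, 132, 429) pull away quickly once more than two phrases are involved.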

In this way, we were able to perform the inverse transform on the ambiguity coefficients in order to
recover the underlying behavior of the EQSP conjunction mechanism. We are now in a position to rewrite
syntactically-parallelp to be more comprehensible and more efficient, without disturbing the external
behavior.

12.  Conclusion 

We began our discussion with the observation that certain grammars are “every way ambiguous” and
suggested that this observation could lead to improved parsing performance. Catalan grammars were then
introduced to remedy the situation so that the processor could delay attachment decisions until it discovers
some more useful constraints. Until such time, the processor can do little more than note that the input
sentence is “every way ambiguous”. We suggested that a table lookup scheme might be an effective method
to implement such a processor.

In  some  sense,  this  approach  is  a  formalization  of  a  very  old  idea.  That  is,  it  has  been  noticed  for  a  long 
time  that  it  might  be  advantageous  to  enrich  a  processor  with  the  capability  to  attach  certain  ambiguous 
constituents  to  several  places  in  a  single  step.  Pseudo-attachment  [1:  pp.  65-71]  and  permanent  predictable 
ambiguity  [14:  pp.  64-65]  are  two  such  proposals.  However,  these  mechanisms  have  always  lacked  a  precise 
interpretation; Catalan grammars provide a much more formal way of coping with “every way ambiguous”
grammars.

We then introduced rules for combining primitive grammars, such as Catalan grammars, into composite
grammars. This linear systems view “bundles up” all the parse trees into a single concise description which is
capable of telling us everything we might want to know about the parses (including how much it might cost
to ask a particular question). This abstract view of ambiguity enables us to ask questions in the most
convenient order, and to delay asking until it is clear that the pay-off will exceed the cost. This
abstraction was very strongly influenced by the notion of delayed binding.

We have presented combination rules in three different representation systems: power series, ATNs, and
context-free grammars, each of which contributed its own insights. Power series are convenient for defining
the algebraic operations, ATNs are most suited for discussing implementation issues, and context-free
grammars enable the shortest derivations. Perhaps the following quotation best summarizes our motivation for
alternating among these three representation systems:

(80)  “A  thing  or  idea  seems  meaningful,  only  when  we  have  several  different  ways  to  represent  it  — 
different  perspectives  and  different  associations.  Then  you  can  turn  it  around  in  your  mind,  so  to 
speak:  however  it  seems  at  the  moment,  you  can  see  it  another  way;  you  never  come  to  a  full 
stop.” [12: p. 19]

In each of these representation schemes, we have introduced five primitive grammars (Catalan, Unit
Step, 1, 0, and terminals) and four composition rules (addition, subtraction, multiplication, and division).
We have seen that it is often possible to employ these analytic tools in order to re-organize (compile) the
grammar into a form more suitable for processing efficiently. We have identified certain situations where the
ambiguity is combinatoric, and have sketched a few modifications to the grammar that enable processing to
proceed in a more efficient manner. In particular, we have observed it is important for the grammar to avoid
referencing quantities that are not easily determined, such as the dividing point between a noun phrase and a
prepositional phrase as in

(81)  Put  the  block  in  the  box  on  the  table  in  the  kitchen  ... 

We have seen that the desired re-organization can be achieved by taking advantage of the fact that the
auto-convolution of a Catalan series produces another Catalan series. This reduced processing time from O(n^3) to
O(n). Similar analyses have been discussed for a number of lexically and structurally ambiguous
constructions, culminating with the example in section 9 where we transformed a grammar into a form that
could be parsed by a single left-to-right pass over the terminal elements. Currently, these grammar
re-formulations have to be performed by hand. It ought to be possible to automate this process so that the
re-formulations could be performed by a grammar compiler. We leave this project open for future research.
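
The auto-convolution property can be checked numerically. The sketch below is an illustration under our own naming (`auto_convolution` is hypothetical and is not the grammar compiler envisaged above): it convolves the Catalan sequence with itself and compares the result with the same sequence shifted by one position.

```python
from math import comb


def catalan(n: int) -> int:
    """Return the n-th Catalan number (1, 1, 2, 5, 14, ...)."""
    return comb(2 * n, n) // (n + 1)


def auto_convolution(series):
    """Coefficients of series(x) * series(x), truncated to len(series) terms."""
    return [
        sum(series[i] * series[m - i] for i in range(m + 1))
        for m in range(len(series))
    ]


cat = [catalan(n) for n in range(8)]
shifted = [catalan(n + 1) for n in range(8)]

print(auto_convolution(cat))
print(shifted)
```

Both lines print the same list, so a sum over all dividing points between two Catalan constituents can be replaced by a single lookup of the next Catalan number.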